Taxonomy, comparative genomics of Mullein (Verbascum, Scrophulariaceae), with implications for the evolution of Verbascum and Lamiales

Background The genus Verbascum L. (Scrophulariaceae) is distributed in Africa, Europe, and parts of Asia, with the Mediterranean having the most species variety. Several researchers have already worked on the phylogenetic and taxonomic analysis of Verbascum by using ITS data and chloroplast genome fragments and have produced different conclusions. The taxonomy and phylogenetic relationships of this genus are unclear. Results The complete plastomes (cp) lengths for V. chaixii, V. songaricum, V. phoeniceum, V. blattaria, V. sinaiticum, V. thapsus, and V. brevipedicellatum ranged from 153,014 to 153,481 bp. The cp coded 114 unique genes comprising of 80 protein-coding genes, four ribosomal RNA (rRNA), and 30 tRNA genes. We detected variations in the repeat structures, gene expansion on the inverted repeat, and single copy (IR/SC) boundary regions. The substitution rate analysis indicated that some genes were under purifying selection pressure. Phylogenetic analysis supported the sister relationship of (Lentibulariaceae + Acanthaceae + Bignoniaceae + Verbenaceae + Pedaliaceae) and (Lamiaceae + Phyrymaceae + Orobanchaceae + Paulowniaceae + Mazaceae) in Lamiales. Within Scrophulariaceae, Verbascum was sister to Scrophularia, while Buddleja formed a monophyletic clade from (Scrophularia + Verbascum) with high bootstrap support values. The relationship of the nine species within Verbascum was highly supported. Conclusion Based on the phylogenetic results, we proposed to reinstate the species status of V. brevipedicellatum (Engl.) Hub.-Mor. Additionally, three genera (Mazus, Lancea, and Dodartia) placed in the Phyrymaceae family formed a separate clade within Lamiaceae. The classification of the three genera was supported by previous studies. Thus, the current study also suggests the circumscription of these genera as documented previously to be reinstated. The divergence time of Lamiales was approximated to be 86.28 million years ago (Ma) (95% highest posterior density (HPD), 85.12–89.91 Ma). The complete plastomes sequence data of the Verbascum species will be important for understanding the Verbascum phylogenetic relationships and evolution in order Lamiales. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08799-9.


Introduction
Scrophulariaceae family, generally known as the "figwort" consist of angiosperm plants that are herbs and with one genus of shrubs [1]. The family has about 62 genera and approximately 1830 recognized species [2]. The phylogenetic relationship within Scrophulariaceae remains unresolved topic in angiosperm systematics [3]. To date, only chloroplast plastomes of three genera; Buddleja L. [4], Scrophularia L. [5], and Verbascum L. [6,7], have been studied in this family. The genus Verbascum (Scrophulariaceae, Lamiales), commonly recognized as "Mullein", comprises about 360 species that are widely distributed in temperate regions in Europe, Africa, and Asia [8][9][10][11]. The main diversity hotspot for this genus is Turkey with 235 existing species [12]. The genus Verbascum is still poorly understood, and new species are described regularly, particularly in Turkey [13]. The taxonomy of Verbascum has been a source of contention, and it varies depending on the treatment. This genus is morphologically characterized by rodulate, biennial, or perrenial herbs with yellow-flowered, thyrsic, or racemose inflorescences [1,14]. Verbascum extracts, decoctions, and infusions have been utilized in traditional medicine for a long time. The leaves have been used as a diuretic, sudorific, expectorant, sedative and the flowers have mucolytic and expectorant properties [15][16][17].
The generic circumscriptions, and placement of the genus Verbascum is a challenge [18], and this has been the focus of contentious debate among the researchers. The initial description and classification of Verbascum was done by Linnaeus [19], grouped species with 4 stamens in the genus Celsia L. and species with 5 stamens in the genus Verbascum. However, Fischer and Meyer [20] separated Verbascum nacolicum to a new genus Staurophragma because of the oblong cylindrical capsule, a distinguishable feature that he observed. In addition, species which were grouped under the genus Celsia were considered as Scrophularia by wilder [21], Allonsona by (Rui and Pav. 1786), Janthe and Thapsandra by Griseb., in 1844, Triguera by Dunal (1852), and lastly Alects by Schintz (1889). Bentham's treatment adopted the classification of Celsia and Verbascum by Linneaus's, based on the presence of stamens [22]. Kuntze (1891) combined the genus Celsia with Verbascum, based on overall resemblance [23]. However, Murbeck (1925) distinguished Celsia and Verbascum by the number of stamens, the presence of a sessile or stipitate placenta, and the number of blooms in each bract [24]. Distinguishing the species based on morphological features (four or five stamens) was found unreliable. Thus, two genera Celsia and Staurophragma were transferred to Verbascum [18,25,26]. The findings of molecular phylogenetic investigations later verified the classification [13]. Verbascum's systematics has had little attention in contemporary to Scrophulariaceae research; hence, it isone of the least understood in terms of infrageneric categorization. Murbeck (1933), separated this genus into two sections using seed morphology, Aulacospermae Murb. with a longitudinally grooved seed, and Bothrospermae (Murb.) Kamelin with diagonally grooved and alveolate seeds [27,28]. Huber-Morath (1978) presented an alternate classification system for Verbascum in his review of the Turkish members of the genus [29]. In his treatment of the genus, he erected 13 artificial groups A to M among 243 Verbascum species (including 129 hybrids) and claimed that all Turkish Verbascum species belonged to sect. Bothrospermae Murb. In another regional analysis published for the U.S.S.R separated Verbascum and Celsia and documented 51 species [28]. In 1981, another regional classification published for the Iranian plateau, merged the genus Celsia with Verbascum, but the classification did not suggest any infrageneric classification for the fortynine species and 4 additional hybrids [29], while classification for Europe documented nearly a hundred species of Verbascum, including Celsia [25].
For the Verbascum species distributed in East Africa, one species is only recognized from the region V. brevipedicellatum and one new genus Pelidium from the family Scrophulariaceae was documented [30][31][32], while V. sinaiticum is distributed in Africa and Arabian Peninsula. Verbascum brevipedicellatum is a perennial herb that is well defined morphologically [31]. Currently, the regional flora of Somalia and plants of the world online database (https:// powo. scien ce. kew. org/) shows the infrageneric classification of the V. brevipedicellatum first published in 1973 was a synonym of Rhabdotosperma brevipedicellata (Engl.) Hartl [33][34][35]. However, in the Flora of Tropical East Africa, R. brevipedicellata is documented as a synonym of V. brevipedicellatum with a note description that it is an extremely variable species, and field studies would be necessary to study the morphological variations among populations. Currently, with the material studied, it was difficult to distinguish the species, even to classify them to lower infraspecific ranks that have been described around this species. Though documented as a synonym of R. brevipedicellata in the plants of the world online (https:// powo. scien ce. kew. org/). It is still unclear if the recognition of this species is warranted or supported by other lines of evidence, such as phylogenetic data.
Recently, molecular analyses have been conducted to understand the phylogeny of the genus Verbascum. The first study was conducted using a nuclear internal transcribed spacer (ITS) and three non-coding chloroplast DNA regions (trnY-T, psbA-trnH, and trns-G) using 41 taxa representing two different genera [13]. This study supported the monophyly of the genus Verbascum, which agreed with the previous morphological studies of Scrophulariaceae [27]. However, the study established no supportive evidence for the subgeneric classification of the genus Verbascum. Previuous molecular analysis conducted by Ghahremaninejad et al. [13] suggested that the genus could be monophyletic. Sotoodeh [36], further investigated the genus in Iran and his finding was similar. However, recent studies using plastomes of V. phoeniceum L. and V. chinense L. and twenty-eight species in the order Lamiales revealed the placement of the genus Verbascum [6,7]. Bi et al. [6], showed that Verbascum formed a monophyletic branch sister to Scrophularia while another study by He et al. [7], placed Verbascum as sister to Scrophularia. Buddleja formed a sister clade to (Verbascum + Scrophularia). Besides, the study by He et al. [7], was not in agreement with the previous studies which indicated the monophyly of the genus. Therefore, the classification of this genus within the family Scrophulariaceae is controversial, and more molecular data is required to solve the existing problem. In comparison to chloroplast DNA markers, plastome studies provide more comprehensive genetic data [5,37,38]. Plastomes have been recently used as a useful tool in phylogenetic studies and genetic diversity [39][40][41]. Additionally, the variable regions of plastomes can be used as molecular markers for future phylogenetic and genetic diversity studies [41]. In this study, we sequenced and analyzed the whole plastomes of V. sinaiticum, V. brevipedicellatum, V. thapsus, V. songaricum, V. blattaria, V. chaixii, V. phoeniceum including two additional sequences from NCBI (NC050920 and MT610040). The main goals of this study are (i) to annotate and compare the plastomes of the eight species of genus Verbascum; (ii) to confirm if the genus Verbascum is monophyletic or nested within Scrophularia; (iii) to evaluate the infraspecific classification of Verbascum brevipedicellatum to Rhabdotosperma brevipedicellatum (Engl.) Hartl using phylogenetic analysis; (iv) to identify the fast-evolving cpDNA markers for species identification, phylogenetic construction, and phylogeographic studies in future; (v) to understand the evolutionary relations of Verbascum species and other species in order Lamiales.

Plastomes features
Genome sequencing was performed using the Illumina paired-end technology platform at the Novogene Company (Beijing, China), the total genomic DNA of seven (one accession (V. phoenecium) is similar to the one available in the NCBI) Verbascum species were sequenced ( Table 1). The plastomes of these seven species were typical circular double-stranded structures and ranged from 153,014 bp in V. blattaria to 153,291 bp in V. chaixii (Fig. 1). The plastome sequences of the seven Verbascum species were deposited in GenBank (Table 1). All seven plastomes show a quadripartite structure, including LSC region (84,263 -84,833 bp) and SSC region (17,884 bp), which are separated by a pair of IRs (25,426to 25,467 bp) regions. Comparative analysis among the seven species plastomes showed that V. chaixii was the largest in size compared to others. All plastomes encoded 114 unique genes, comprising 80 protein-coding genes, 30 tRNA genes, and four rRNA genes ( Fig. 1, Table 2).

Repeat analysis
Repeat sequences in the seven sequenced Verbascum species were analyzed using Tandem and REPuter. A total of 209 tandem repeats were detected in all seven Verbascum species (Fig. 2 (Fig. 2). In the MISA analyses, a total of 348 microsatellites regions (> = 10 bp) were identified in the seven plastomes (Table 2). Each Verbascum species plastome was found to contain 45-53 SSRs, of which 53 SSRs were shared among the three plastomes, 49 SSRs in two species. No pentanucleotides motifs were found in seven Verbascum species while hexanucleotide motifs were only found in V. blattaria. Among these repeats, mononucleotides motifs were the most abundant and most SSRs contributed to the AT richness in all seven Table 2 Gene composition in seven Verbascum species ‡Genes comprising two introns. †Genes with a single intron. Ψ Pseudogene. (× 2), genes duplicated in the inverted repeat regions

Comparative analysis
The Verbascum plastomes divergence results showed that IR regions have higher similarity compared to the single copy (SC) regions. More genes were conserved in the coding regions than in the noncoding regions, which is a common phenomenon in most angiosperms [42]. Significantly, the most conserved regions were observed across all species in the tRNA and rRNA regions. High variation was observed in the intergenic spacer regions of trnH-GUG-psbA, atpH -atpI, rps16 -trnQ-UUG, petN -psbM, psaA -ycf3, ycf4 -cemA. Non-coding regions in ccsA -psaC, rpl32-trnL-UAG reported high divergence in the SSC. The coding regions with the highest variation include accD, rpoC2 and matK. Divergence was also detected in introns of trnK-UUU, rps16, ycf3, petD, rpl16, clpP, rps12 and ndhA. These are regions of rapid evolutionary changes and therefore are essential sites for the development of molecular markers that could be useful in population genetics and phylogenetic studies.

Divergence hotspot identification
To estimate the divergence of among the Verbascum species, a total of 5 cp genome sequences of Verbascum were chosen to calculate nucleotide diversity (Pi), including V. sinaiticum, V. thapsu, V. brevipedicellatum, V. phoeniceum andV. chinense. Based on this analysis, we identified five remarkably divergent regions among the fiveplastomes, which were higher than the 0.012 Pi value (rps16-trnQ-UUG, rpl32-trnL-UAG, ndhD-psaC, trnH-GUG, and petD ( Fig. 4)). Gene trnH-GUG was the most divergent region with the highest Pi value of (0.013) and is located in the LSC region. These highly divergent regions could be used as potential molecular markers for phylogenetic reconstruction of the genera Verbascum. Overall, the result of this study revealed that sequence divergence was concentrated in the LSC and SSC regions, whereas IR regions presented less divergence, consistent with the mVISTA results ( Fig.  S1).

Inverted repeats
The eight Verbascum plastomes compared LSC/IRs and IRs/SSC borders and their adjacent genes (Fig. 5). The rps19 gene was located in all Verbascum species in the LSC/IRb region. In V. brevipedicellatum rps19 gene prolonged to the IR with 43 bp and V, chinense with 41 bp. While in the other remaining six Verbascum species rps19 gene extended with 37 bp in length (Fig. 5).
The ycf1 gene was situated at the IRb/SSC border junction, extending 824 bp into the IRb region in all Verbascum species. The ndhF gene was situated at the junction of SSC/IRa, extending 2 bp into the IRa region. Overall, the structure and gene content of the eight plastomes s were consistent, no significant expansion or contraction of IR regions were found in the Verbascum species.   [43]. Purifying selection is common in many protein-coding regions [44]. In this study, 90% of the genes had a Ka/Ks ratio of less than 1. Among the extracted regions, the genes with the highest Ka/Ks variability can be used as candidate barcodes to differentiate among Verbascum species and in the future applied to perform phylogenetic and phylogeography analysis.
To further verify the results, we calculated the non-synonymous (dN) and synonymous (dS) substitution rates. dN/dS method was used to compare in the selection analysis to check for putative bias for the same functional protein-coding sequences in 80 genes of seven species, including V. brevipedicellatum, V. sinaiticum, V. thapsus, V. phoeniceum, V. chinense, B. colvilei, and S. henryi in EasyCodeML software [45]. The ratio (ω = dN/dS) were calculated based on four site-specific models (M0 vs. M3, M1a vs. M2a, M7 vs. M8, and M8a vs. M8) with likelihood ratio test (LRT) threshold of p < 0.05 elucidating adaptation signatures within the genome. Among the four models, the comparative LRT of M7 vs. M8 was positive in determining the chi-square p-value < 0.05 and the selection strength. Bayes Empirical Bayes (BEB) [46] analysis was implemented in model M8, two sites were detected as the site of positive selection which represented one photosynthesis-related gene ndhF, and hypothetical gene ycf1 (Table 4).

Phylogenetic analysis
To gain a further insight into the phylogenetic position within Verbascum species and their relationship to other closely related species and other families in order Lamiales, two data sets including 80 coding sequence (CDs) genes and nrDNA (ITS1 + 5.8S + ITS2) sequences were aligned to construct the phylogenetic tree. A phylogenetic study using 80 coding sequence (CDs) genes formed a well-supported tree. The Maximum Likelihood (ML) and Bayesian Inference (BI) inferences produced similar trees with a high support value (95% of nodes with bootstrap support > 90 in ML, 97% of nodes with bootstrap support > 0.9 in BI) (Fig. 6). The families, Scrophulariaceae, Oleaceae, Gesneriaceae, and Plantaginaceae were the first basal angiosperms of Lamiales which branched early. Within the higher core, Lamiales (Lentibulariaceae + Acanthaceae + Bignoniaceae + Verbenaceae + Pedaliaceae) formed a sister clade with (Lamiaceae + Mazaceae + Orobanchaceae + Paulowniaceae + Phyramaceae) families (Mazaceae noted with steric because currently is treated differently within Phyramaceae and in our study, it formed a sister clade to Lamiaceae). Within Scrophulariaceae, Verbascum formed a sister clade to Scrophularia, while the genus Buddleja formed a monophyletic clade. Within Verbascum, V. blattaria formed a sister clade to (V. chinense + V. phoeniceum) + (V. brevipedicellatum + V. thapsus + V. sinaiticum + V. songaricum + V. chaixii) with high bootstrap support values indicating they are closely related and belong to the same genera. In addition, the phylogenetic analysis based on the nrDNA (ITS1 + 5.8S + ITS2) sequence within Scrophulariaceae (Verbascum + Scrophularia + Buddleja) suggested that V. brevipedicellatum formed a sister clade to V. phoeniceum + (V. nudicaule + Verbascum sp.) with high bootstrap support  values indicating they are closely related and belong to the genus Verbascum (Fig. 7).

Comparative plastome analysis
Plastomes are used in taxonomic and evolutionary studies to assess evolutionary relationships and compare genome structure, particularly closely related species. Comparative study of Verbascum plastomes indicated that they are highly conserved. The complete plastomes of seven Verbascum species were quadripartite in structure and composed of the LSC, SSC, and (IRa/IRb) repeat regions [47,48]. The plastome sizes of seven Verbascum species ranged from 153,014 to 153,481 bp. The total gene content and arrangement of seven species of Verbascum were almost similar. The seven species confined the equal number of CDs, rRNA, and tRNA genes. Due to the low substitution rate in their genomes and their recent estimated time divergence, this kind of conservatism is common. Previous studies performed in different closely related taxa, for example, tribes and genera have reported similar findings [49,50].
The complete plastomes of most angiosperm plants are highly conserved, but contraction and expansion of the SC and IR regions are believed to be the major cause of plastome size variations [51,52]. For instance, inversions, loss of genes, absence of IR region, contraction/ expansion in IR, and duplication of the rps19 gene in the IR have been reported in different species [53,54]. In this study, we observed that the boundary region between the two inverted repeats and single copy region were highly conserved, and gene distribution and location specification were consistent. Additionally, a comparison of genes near the IR region of eight different Verbascum species showed that genes exhibit different degrees of contraction/expansion at the boundary of the IR region. These results are consistent with previous conclusions, which showed that most angiosperm plants plastomes are usually highly conserved, but little differences occur due to contraction and expansion of the IR and SC junction regions [55][56][57].

Repeat analysis
Normally, most repeats are found dispersed in noncoding regions and within the ycf2 gene [58]. Comparative analyses of different complete plastomes have indicated that repeated motifs are the factors that cause gene deletion, insertion, and replacement [59,60]. The repeat analyses of the seven Verbascum species detected 264 repeats, most of which are 30-41 bp long, with 7 repeats of 44 bp, 52 bp, and 61 long. Among the seven Verbascum species, V. chaixii contained the largest number of repeated sequences. Plastome combination and sequence differences mostly occur because of untimely recombination of repetitive sequences and slipped-strand mismatches [56]. These repeats are the basis of genetic markers for phylogenetic and population analyses, being applied widely because of their highly polymorphic and simplicity to be amplified [61,62]. In this analysis, 348 simple sequence repeat motifs were found, most of them were poly-A and T as previously supported by other studies [63,64]. In addition, dispersed repeats have been increasingly recognized as a potential genetic variation and regulation source. The repeats found in seven Verbascum species indicate genetic variation among the species. Together with these regulatory roles, a structural role of repeated DNA in shaping the 3D folding of genomes has also been proposed [65].

Selective pressure analysis
The synonymous (Ks) and non-synonymous (Ka) substitution rates and their corresponding ratio (Ka/Ks) also known as (dN/dS) have been used widely in calculating nucleotide evolution rates and natural selection pressure [46]. Generally, studies done previously indicated that Ka/Ks ratios mostly are lower than one [66], due to synonymous nucleotide substitutions rates that occur more often compared to nonsynonymous substitutions rates. The genes with the highest Ka/Ks variability can be used as candidate barcodes to differentiate species and in the future applied to perform phylogenetic and phylogeographic analyses. In this study, 15 extracted regions (atpI, atpH, CssA, matK, ndhE, psaJ, psbE, psbN, psbT, rpl23, rps14, rps19, ycf15, and trnH-psbA) were above 1 indicating positive selection.
To further confirm the results, we calculated the nonsynonymous (dN) and synonymous (dS) substitution rates using EasyCodeML [45] of seven species, including V. brevipedicellatum, V. sinaiticum, V. thapsus, V. phoeniceum, V. chinense, B. colvilei, S. henryi within Scrophulariaceae species. We identified two sites that were  (Table 4). All 17 sites can be used to differentiate species and applied in phylogenetic and phylogeography analyses of the genus Verbascum. The varying results of the Ka/Ks ratio obtained in our study are evidence that evolutionary rates of chloroplast genomes vary among genes. This supported by the previous conclusions drawn by Manezes et al., (2018) [67].

Phylogenetic analysis
The sequencing of complete plastomes has increased recently due to the advancement in sequencing technology [68]. Plastomes are important in molecular, phylogenetic, and evolutionary studies as well as solving relationships in plants and can be applied to plants DNA barcodes [69]. The phylogenetic relationships within Scrophulariaceae is challenging to taxonomists, these difficulties in phylogenetic reconstruction of Scrophulariaceae are likely due to reticulate evolution caused by hybridization and polyploidization [5,13]. Phylogenetic analysis using 80 protein-coding genes produced a well-supported phylogenetic tree (Fig. 6). The first basal angiosperm classification of Lamiales (Oleaceae, Gesneriaceae, Plantaginaceae, and Scrophulariaceae) was confirmed, and the other families (Phyrymaceae + Orobanchaceae + Paulowniaceae) which were in concordance with the previous studies [5,[70][71][72][73]. However, the most problematic of the families' placement in the previous studies were (Lentibulariaceae + Verbenaceae + Schelegeliaceae + Bignoniaceae + Pedaliaceae + Acanthaceae + Thormandersiaceae + Martyniaceae), which had a support value of (BIPP/MLBS 1/48), but the clade was unsolved in the analyses by Schäferhoff et al. [70], however, a study by Rodriguez & Olmstead et al. [73] found that these families formed a separate clade with support values (BIPP/MLBS/ MPBS = 1/91/ < 5, 0.77/25/ < 5, 0.99/58/12). Moreover, Xu et al., [5] study was in agreement with the placement of these families ((Bignoniaceae + Verbenaceae) + Pedaliaceae) + (Acanthaceae + Lentibulariaceae) with a study done by Schäferhoff et al. [70], with high clade support values of (BIPP/MLBS = 1/97) using 80 protein-coding genes. In a study by Bi et al. [6] and He et al. [7], using complete cp genomes, the families formed a clade that was congruent with Xu et al. [5]. However, this study using 80 protein-coding genes with 89 taxa to infer the placement of these families (Lentibulariaceae + Acanthaceae) + Pedaliaceae) formed a sister branch to (Verbenaceae + Bignoniaceae) with high bootstrap clade value of (ML/BI 96/0.971). This may be attributed to fewer data used previously to infer the placement of these families. Moreover, different methods give different topologies, and the genome sequence variation within different species may result in this difference. Within all higher core, Lamiales (Lentibulariaceae + Acanthaceae + Bignoniaceae + Verbenaceae + Pedaliaceae) formed a sister clade to (Lamiaceae + Phyrymaceae + Orobanchaceae + Paulowniaceae + Mazaceae* (Mazus, Lancea, and Dodartia) families, (Mazaceae noted with steric because currently it is treated separately differently within Phyramaceae, in this study, it formed a sister clade with Lamiaceae). The results of this study confirm the results of a previous study that formed a separate clade [74], and consistent with the AGP IV classification of the Mazaceae family [75]. Therefore, in this study, we suggest the classification of the Mazaceae family to be reinstated back to formal recognition as suggested by James L. Reveal [76]. Interestingly, the relationships of the genera within the family Scrophulariaceae is well solved and confirmed.
In the plastomes of 80 CDS tree species, within Verbascum, V. blattaria formed a sister clade to (V. chinense with high bootstrap support values indicating they are closely related and belong to the same genera. But the species from Kenya and China were not well separated into two monophyletic lineages. Even two species from Kenya, V. sinaiticum and V. brevipedicellatum, were clustered with species from China, V. sinaiticum was instead more closely related to V. thapsus from Yunnan Province, China (Fig. 6). This suggests that research on the origin and biogeography of Verbascum species is still underexplored. In addition, the phylogenetic analysis based on nrDNA (ITS1 + 5.8S + ITS2) sequence within Scrophulariaceae (Verbascum + Scrophularia + Buddleja) suggests that V. brevipedicellatum formed a sister clade to V. phoeniceum + (V. nudicaule + Verbascum sp.) with high bootstrap support values indicating they are closely related and belong to the genus Verbascum (Fig. 7). However, in our study V. macrocarpum (KP738154) did not cluster into the genus Verbascum, but formed a monophyletic branch, which conforms with the previous study that the species are unresolved [13]. More chloroplast genome data on V. macrocarpum may be needed to solve this problem. In this study, our crown age estimation of Lamiales of 86.26 (Ma) (HPD% 85.1-89.9 Ma) was in concordance with the previous studies divergence time estimation (95% HPD, 67-101 Ma, 95% HPD, 70-99.8 Ma) [5,[77][78][79]. We selected two reliable fossils that represent the old dating period of order Lamiales as constraints for calibrating the age of our angiosperm tree from the well-reported previous analyses [5]. However, Menispermaceae and epiphytic ferns time divergence was estimated around the cretaceous and Paleocene periods which is linked to the development of tropical rain forest angiosperms [80,81]. The study provides a more current, comprehensive and detailed framework to the evolution of the family Scrophulariaceae and other families in order Lamiales with additional data of complete chloroplast plastomes.

Taxonomic treatment of Verbascum brevipedicellatum
Our phylogenetic results of nrDNA sequences and plastomes showed Verbascum brevipedicellatum formed a sister relationship with other Verbascum species. Although several names have been associated with this species in East Africa. In the FTEA it was documented as: Verbascum Verbascum is a Linnean genus [19], while Rhabdotosperma has been designated by Hartl [82], although in the flora of Somalia Rhabdotosperma is the conserved name, so we consider it as superfluous because according to article 14.5 of the International Code of Nomenclature for algae, fungi, and plants [83], specifies that if a conserved name competes with an earlier name against which it has not explicitly been conserved, that the early name is adopted. Therefore, in this regard, Verbascum has priority over Rhabdotosperma and also over Celsia. Additionally, based on the phylogenetic results and type specimens checked, we proposed to reinstate the species status of Verbascum brevipedicellatum (Engl.) Hub.-Mor.

Conclusion
In this study, the plastomes of seven Verbascum species (including all the Verbascum species distributed in China and two from Kenya) were sequenced and analyzed, determined their phylogenetic placement as well as their dating in the order Lamiales. The chloroplast plastomes indicated that gene organization, GC content, and plastome size are highly conserved. Tandem repeats were highest in V. chaixii followed by V. sinaiticum and then V. phoeniceum (> 31). The other remaining five Verbascum species recorded the lower number of tandem repeats (< 31), while simple sequence repeats were highest in V. chaixii, V. sinaiticum and then V. phoeniceum respectively. The phylogenetic analysis based on 89 taxa indicated that Verbascum formed a sister clade to Scrophularia. The Scrophulariaceae family diverged at 24.26 Ma (node 9) (95% HPD 23.10-25.67) from Pedaliaceae. The three Verbascum species diverged at 9.35 Ma (95% HPD 8.77-10.00) from Scrophularia. Classification of V. brevipedicellatum to Rhabdotosperma was not supported by our phylogenetic analysis hence suggesting the reinstatement of the species name. Additionally, we suggest the reinstatement of the families Phyramaceae and Mazaceae because they formed separate clades with high bootstrap support values. Notably, their divergence period was also different; Mazaceae diverged early compared to Phyramaceae species. This study is essential because it indicates the relationship of various families as well as the divergence time estimate in order Lamiales. In addition, phylogenetic results of nrDNA sequences and plastomes supported Verbascum brevipedicellatum formed a sister clade with othe Verbascum species.
Total genomic DNA was isolated using an improved Cetyltrimenthylammonium bromide (CTAB) method [84]. DNA was quantified through fluorometry using NanoDrop spectrometer or microplate reader, visualized in a 1% agarose-gel electrophoresis for the quality check. A total amount of 1.5 μg DNA per sample was used as input material for the DNA sample preparations. Sequencing libraries were generated using Truseq Nano DNA HT Sample Preparation Kit (Illumina USA) following the manufacturer's recommendations, and index codes were added to attribute sequences to each sample. Briefly, the DNA sample was fragmented by sonication to a size of 350 bp, then DNA fragments at the end were polished, A-tailed, and ligated with the full-length adapter for Illumina sequencing with further PCR amplification. At last, PCR products were purified (AMPure XP system) and libraries were analyzed for size distribution by Agilent2100 Bioanalyzer and quantified using real-time PCR. Results were then sequenced based on the Illumina paired-end technology platform at the Novogen Company (Beijing, China). In addition to this data, the chloroplast genomes of Verbascum phoeniceum (NC_050920) and Verbascum chinense (MT610040) and other species used in this study were downloaded from NCBI as well as sequences for phylogenetic analysis ( Table 5). The nrDNA (ITS1 + 5.8S + ITS2) used in this study were also downloaded from NCBI (Table 6).

Plastome and nrDNA assembly, annotation
The high-quality reads were used for de novo assembly to reconstruct Verbascum chloroplast genomes using GetOrganelle v.1.7.2. with wordsize of 150 and K-mer sizes of 105 [85][86][87][88]. The visualization of the final assembled graphs was done using Bandage to authenticate the produced plastid plastome [89]. The quality of the newly assembled plastomes was confirmed according to the reading level by aligning the trimmed raw reads to the de novo assemblies using Geneious mapper, Geneious prime 2021 [90], with medium-to low-sensitivity option and iteration up to five times [91]. The annotation of the sequences was performed using CpGAVAS2 [92]. The online blast software version 2.2.25 (https:// blast. ncbi. nlm. nih. gov/ Blast. cgi) software was used to match the cp plastomes CDs sequences on NCBI and manual edit them correctly. The tRNAscan-SE software was used to annotate tRNA [93]. The complete circular plastomes were mapped using the OGDRAW program (https:// chlor obox. mpimp-golm. mpg. de/OGDraw. html) [94], and the complete genomes were deposited in the Gene bank, Accession numbers: V. sinaiticum (MW751818),

Comparative analysis
To compare the structural differences and similarities between Verbascum species, the mVISTA online tool was applied to analyze the five representative species (V. sinaiticum, V. phoeniceum, V. brevipedicellatum, Verbascum chinense, V. thapsus) plastome sequences with the Shuffle LAGAN model [98], using the reference of the annotation of V. chinense. The chloroplast online tool was used to indicate the comparison between the boundaries between, Large Single regions (LSC), Small Single copy (SSC), and the inverted repeats (IR) regions among the eight (V. sinaiticum, V. phoeniceum, V. brevipedicellatum, Verbascum chinense, V. thapsus, V. chaixii, V. songaricum, V. blattaria) complete cp plastomes. To explore the highly divergent regions of the cp genomes within eight Verbascum species, the software DnaSP version 6.12.03 [99] was used to calculate nucleotide diversity (Pi). The step size and window length were set as 200 bp and 600 bp, respectively. Geneious prime 2021 [90] was used to detect the contraction/expansion of the inverted repeat regions (IRs), and the final graph of expansions/ contractions was visualized using Adobe Illustrator.

Selective pressure analysis
To analyze the substitution rate within Verbascum species, a total of 119 regions were extracted from the three representative species (V. sinaiticum, V. brevipedicellatum, V. thapsus) cp plastomes. Geneious prime was used to extract the regions. Stop codons were cut and removed manually, and the gaps were removed within the sequences using the Gblocks server online. Mega 7 was used to align combined files after the removal of stop codons and gaps. We converted the aligned files manually by saving them into axt format. The non-synonymous (Ka), synonymous (Ks) substitution rates of 80 protein coding genes and non-coding from the Verbascum species and Ka/Ks ratio of each region were calculated using Ka/Ks calculator v 2.0 with maximum likelihood methods Ma (a model that averages parameters across 14 candidate models) and Ms method (a model that has the smallest AICC among 14 candidate models) in a standard code [100]. Another method was employed to dertemine positive selective pressure within shared genes of seven species was evaluated using the PAML v4.7 [45] package implemented in EasyCodeML software. Non-synonymous (dN) and synonymous substitution (dS) substitution rates, and their ratio (ω = dN/dS) were calculated based on four site-specific models (M0 vs. M3, M1a vs. M2a, M7 vs. M8 and M8a vs. M8) with likelihood ratio test (LRT) threshold of p < 0.05 elucidating adaptation signatures within the genome. The models permit dN/dS variation within sites while keeping the ω ratio fixed within branches. Selective pressure analysis was conducted along ML tree in plain Newick format based on proteincoding sites used in the generation of phylogenetic relationship of the selected seven species. Here, individual CDS sequences were aligned in correspondence to their amino acids, and their selection was evaluated based on both ω and LRTs values.

Phylogenetic analysis
To infer the phylogenetic tree, we used 80 protein-coding genes of 87taxa in order Lamiales and two outgroups, which all but seven of our newly sequenced data were downloaded from NCBI. (Table 5), combined in a single file and aligned using MAFFT [101]. Geneious prime version 2021 was used to concatenate the sequence into various readable formats for other analyses. Performed maximum likelihood (ML) analyses using IQ tree [102], integrated with Phylosuite [103], under the best-fitting model according to Akaike Information Criterion (AIC) of all protein-coding genes under the GTR + R6 + F model for 5000 ultrafast bootstraps [104], as well as the Shimodaira-Hasegawa-like approximate likelihood-ratio test [105]. To ascertain the results, BI analysis was performed using the same data in 80 protein-coding genes of 87 taxa in order Lamiales in MrBayes 3.2.7a and using the Markov chain Monte Carlo (MCMC) method, with two independent runs for 10 million generations (Number of Chains is four). We also used second data set to confirm the taxonomic treatment of Verbascum brevipedicellatum within the family Scrophulariaceae. We used nrDNA (ITS1 + 5.8S + ITS2) of 36 taxa in the family Scrophulariaceae (Verbascum + Scrophularia + Buddleja), and two outgroups all but three of our newly sequenced data were downloaded from NCBI. (Table 6), sequence alignment was performed using MAFFT v7.308 with default parameters setting. Phylogenetic relationships reconstructions were performed based on maximum likelihood (ML) analysis using the program IQ-Tree with 1000 bootstrap replications. The best-fit model GTR + F + G4 was chosen according to Bayesian Information Criterion (BIC).

Divergence time estimation
BEAST software was used to perform the analysis. Two secondary calibration fossils and one fossil record were used to constrain the nodes; node (0) the 95% highest posterior density (HPD) the old (limit)divergence time for Lamiales (95% HPD, 67-101 Ma; Barba-Montoya et al. [77]), constraining the crown age of Lamiales (95% HPD, 85.05 Ma (Lognormal priors, 1.0 Mean (M), Sigma (S) 0.5. Constraining the most common ancestor of Verbascum (lognormal priors, Off set 65.64, M 1.0, S ¼ 0.5) [5]. The chronogram and branch intervals were linked to partitions and other constraints were unlinked. We used the Yule process prior for speciation and uncorrelated lognormal relaxed clock model. Prior's constraints time of the nodes were selected using the Log-Normal distribution of mean and standard deviation set at the mean and median limits, and the GTR + I + G substitution model was set as the nucleotide substitution model. Two calibration points were selected to estimate the divergence time of Verbascum (Scrophulariaceae): (0) the 95% highest posterior density (HPD) limit of crown age for Lamiales (